Skip to content

Implement Typed Documents and TypeRegistry #282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 85 commits into from
May 22, 2025
Merged

Implement Typed Documents and TypeRegistry #282

merged 85 commits into from
May 22, 2025

Conversation

jterapin
Copy link
Contributor

@jterapin jterapin commented Mar 6, 2025

Description: Implementation for Typed Documents and TypeRegistry. Currently only supports JSON documents.

It is highly likely that we have to revisit this implementation in the future since typed document/type registry still being evolved.

@jterapin jterapin changed the title Implement Typed Documents and TypeRegistry [WIP] Implement Typed Documents and TypeRegistry Mar 6, 2025
@jterapin jterapin changed the title [WIP] Implement Typed Documents and TypeRegistry Implement Typed Documents and TypeRegistry Apr 15, 2025
Copy link
Contributor

@mullermp mullermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Looking better. Still have a bunch of comments though .. In general I think we can still simplify and also be less aggressive on validation and instead be more permissive where possible.

# shape = Smithy::Schema::StructureShape.new
# data = Document::Data.new({ "name" => "example" }, shape: shape)
#
module Document
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting document to be a class and have it be the delegator. What was the intention of making another data subclass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I recall - I believe I wanted to the Document (De)serializers to be nested under the Document namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back to Document class approach with some class methods.

# @param [Hash<String, Shapes::StructureShape>] registry
def initialize(registry = {})
@registry = registry
@shapes_by_type = register_shape_types(registry.values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to populate this from the code generated side? If we must iterate shapes, we may as well backtrack and populate both maps in one pass. That at least reduces generated code. Otherwise is shapes_by_type even necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can populate this from code generated side or build at runtime - depends on what we want. I'm not sure which will be preferred when given context is like EC2 for example.

From what I gathered, we are expected customers to use the type registry so I felt that hiding this detail is better. I guess at that point, you could say we might as well just accept an array of shapes so we can build the registry and shapes_by_type (previous approach before this iteration)

Any preference?

The shapes_by_type is necessary to find shape schema based on runtime shape class. Since runtime shape does not know its own shape schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - for now, I decided to just accept an array of shapes for now (just thinking about EC2 schema file, oof) We can do a switchroo later anyways - it's not firm.


# @api private
# @return [Hash<String, Shapes::StructureShape>]
attr_accessor :registry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these accessors shouldn't exist. Our public methods should hide this detail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that makes sense. Hope the changes I made makes sense.

end

def typed_document?(values)
(values.is_a?(Smithy::Schema::Structure) && @type_registry.shape_by_type(values.class)) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already checked? And wouldn't this always be true if it was a structure, because it would already be registered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll refactor - now that unions are not included in type registry.

ref.shape.member(name) || find_member_ref_by_names(ref, name)
end

def find_member_ref_by_names(ref, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems inefficient. Similar to what we do in codecs, for structure and union, you will want to iterate the shape members and not the values, then you can check json name that way. You're doing a loop for every member, so it's n^2 performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I streamlined closely to how a general codec serializer works. The updated approach does that but - we still need to do some checks since the given value key - could be a jsonName....symbolized key or memberName reflected on the modeled shape.

end
end

def resolve_member_name(member_ref, opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out the location_name approach in my PR - we should use similar terms. You can easily handle this with || optionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, brought that over.

describe Serializer do
let(:shapes) do
shapes = SchemaHelper.sample_shapes
shapes['smithy.ruby.tests#Structure']['members']['timestampDateTime'] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if you move these definitions closer to the test (in the actual tests where they are needed) - it's easier to manage tests that way if they are discrete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - I'll move it.

Copy link
Contributor

@alextwoods alextwoods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - its generally looking good.
I think the functionality from the Document::Data class could be moved into Document as a class (unless theres some reason I'm missing). I also understand why the Document serializer and deserializer exist separately and require a type registery - but I think I would lean towards the public interface for serializing/deserializing documents living on the top level class - it could still require a type registry to be provided and could use these classes under the hood to implement it (and they could then be api private).

@mullermp
Copy link
Contributor

mullermp commented May 9, 2025

That's effectively what I was also saying but I agree.

@jterapin jterapin mentioned this pull request May 20, 2025
Copy link
Contributor Author

@jterapin jterapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look over every comment that I wrote (since I like to write it as I go to remember where I left off) so if something doesn't make sense, let me know! Thanks!

# shape = Smithy::Schema::StructureShape.new
# data = Document::Data.new({ "name" => "example" }, shape: shape)
#
module Document
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I recall - I believe I wanted to the Document (De)serializers to be nested under the Document namespace.

module Schema
module Document
# Serializes data into a document data.
class Serializer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... Ideally, I would like that but for typed documents to work properly (deserializing nested typed documents) - it needs to reference a specific type_registry to (de)serialize properly.

This is prob why Dowling wanted TypeRegistry to handle these responsibilities.

Comment on lines 29 to 30
# serializer.create_document("some document")
# # => {"foo" => "bar"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is wrong, I'll fix.

# # => an instance of Smithy::Schema::Document::Data
# document.discriminator
# # => "smithy.ruby.tests#Structure"
def create_document(data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could create a class method under Document that references this file to handle (under the hood).

# @option opts [Boolean] :use_json_name Whether to use `jsonName` trait
# or just member name. The `jsonName` trait is ignored by default.
# @return [Hash] Serialized document data
def serialize_document(document, opts = {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can make class methods at the Document level

ref.shape.member(name) || find_member_ref_by_names(ref, name)
end

def find_member_ref_by_names(ref, name)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I streamlined closely to how a general codec serializer works. The updated approach does that but - we still need to do some checks since the given value key - could be a jsonName....symbolized key or memberName reflected on the modeled shape.

end
end

def resolve_member_name(member_ref, opts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, brought that over.

Comment on lines 402 to 405
# TODO: failing due to synthetic shape changes
# it 'contains a registry of typed shapes' do
# expect(subject.keys).to match_array(typed_shapes.keys)
# end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out failing test due to synthetic input/output shape id changes


class << self
def type_registry
Smithy::Schema::TypeRegistry.new([CityCoordinates, CitySummary, GetCityInput, GetCityOutput, GetCurrentTimeOutput, GetForecastInput, GetForecastOutput, ListCitiesInput, ListCitiesOutput, NoSuchResource])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could go back to doing map, etc but this will increase file size for services like ec2

@jterapin jterapin requested review from mullermp and alextwoods May 21, 2025 04:00
@mullermp mullermp merged commit 97b7f61 into decaf May 22, 2025
17 checks passed
@mullermp mullermp deleted the typed_documents branch May 22, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants